S(n, m) indicates the magnitude of the spectrum of m frequency channels for n frames. 
The average spectrum value of each frame is derived from the following 
equation. 

$$S(n) = \frac{1}{16} \sum_{l=0}^{15} S(n, l)$$

Here, S(n) indicates the average spectrum value of the n-th frame. 
The spectral dispersion of this frame can be calculated by subtracting the 
average spectrum value from the spectrum of each frequency channel of the 
frame, squaring each difference, and summing the results over all channels. 

$$V(n) = \sum_{l=0}^{15} \bigl(S(n, l) - S(n)\bigr)^2$$

V(n) indicates the spectral dispersion of the n-th frame. FIG. 3 
shows the spectra of four input audio signals: /d/, /s/, /a/, and /silence/. 
Of the four, the silence spectrum is flatter than the other 
spectra, so the spectral dispersion of silence is smaller than that of the 
other three audio signals. Based on this characteristic, audio can be 
separated from background signals. 
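
The average and dispersion calculations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample frames are hypothetical and do not come from the specification, and a frame is assumed to be a list of 16 per-channel spectrum values.

```python
def spectral_mean(frame):
    """Average spectrum value over the frequency channels of one frame."""
    return sum(frame) / len(frame)


def spectral_dispersion(frame):
    """Sum of squared deviations of each channel from the frame average.

    As in the equation for V(n), the sum is not divided by the channel count.
    """
    mean = spectral_mean(frame)
    return sum((s - mean) ** 2 for s in frame)


# Hypothetical 16-channel frames (illustrative values only).
flat_frame = [1.0] * 16            # flat spectrum, similar to silence
peaky_frame = [0.0] * 15 + [16.0]  # energy concentrated in one channel

flat_v = spectral_dispersion(flat_frame)
peaky_v = spectral_dispersion(peaky_frame)
```

A flat spectrum yields a dispersion of zero, while a spectrum concentrated in a few channels yields a large value, which is the property used to tell audio from background.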

Reference numeral 71 is a spectral dispersion threshold calculator, which 
calculates a spectral dispersion threshold from the spectral dispersions of 
several frames of the input audio signal, using the following equation. 

$$V_{TH} = \left( \sum_{n=1}^{15} V(n) \right) \times 1.5$$

VTH here indicates a spectral dispersion threshold based on 10 
frames, and V(n) indicates a spectral dispersion of the n-th frame within a 
silence period. 
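
As a rough sketch of the threshold calculation, the following Python function follows the equation as printed, that is, a plain sum of the silence-period dispersions scaled by 1.5. The function name, the `scale` parameter, and the sample values are hypothetical.

```python
def dispersion_threshold(silence_dispersions, scale=1.5):
    """Spectral dispersion threshold from dispersions of silence-period frames.

    Follows the printed equation: the sum of V(n), multiplied by 1.5.
    """
    return sum(silence_dispersions) * scale


# Hypothetical dispersions of 10 silence frames (illustrative values only).
silence_v = [2.0] * 10
v_th = dispersion_threshold(silence_v)
```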

Reference numeral 81 is an audio clip detector for detecting an audio 
start point and an audio end point. The detector compares the energy 
extracted by the energy extractor 20 with the energy threshold calculated by 
the energy threshold calculator 30, and the spectral dispersion extracted by 
the spectral dispersion extractor 61 with the spectral dispersion threshold 
calculated by the spectral dispersion threshold calculator 71, and thereby 
determines whether the frame is an audio start point. 
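
The comparison step might look like the sketch below. The text states only that each feature is compared with its own threshold, so the way the two comparisons are combined here is an assumption: a frame is treated as audio when either its energy or its spectral dispersion exceeds the corresponding threshold. Taking the first and last such frames as the start and end points is also an assumption, and all names and sample values are hypothetical.

```python
def is_audio_frame(energy, dispersion, energy_th, dispersion_th):
    # Assumption: exceeding either threshold marks the frame as audio.
    return energy > energy_th or dispersion > dispersion_th


def find_clip(energies, dispersions, energy_th, dispersion_th):
    """Return (start, end) indices of the first and last audio frames, or None."""
    flags = [
        is_audio_frame(e, v, energy_th, dispersion_th)
        for e, v in zip(energies, dispersions)
    ]
    if not any(flags):
        return None
    start = flags.index(True)
    end = len(flags) - 1 - flags[::-1].index(True)
    return start, end


# Hypothetical per-frame features (illustrative values only).
energies = [0.1, 0.1, 0.2, 5.0, 4.0, 0.1]
dispersions = [1.0, 1.0, 9.0, 20.0, 15.0, 1.0]
clip = find_clip(energies, dispersions, energy_th=1.0, dispersion_th=5.0)
```

In this example, frame 2 has low energy but a dispersion above the threshold, so it is still picked up as the start of the clip.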

(54) VOICE SECTION DETECTING DEVICE 

(57)Abstract: 

PURPOSE: To detect even the voice of a fricative sound by 
detecting the start point and the end point of a voice section of an 
input voice from the dynamic features of the spectral dispersion and 
energy of the voice. 

CONSTITUTION: The device is provided with an energy extracting 
part 20 for extracting the energy of digital voice data, with a 
prescribed section treated as one frame; an energy threshold calculating 
part 30 for adjusting the average value of the background noise energy 
of the frames to obtain an energy threshold; a spectral dispersion extracting 
part 61 for calculating a spectral dispersion by deriving the average 
value of the spectrum of the frame over its frequency channels; and 
a spectral dispersion threshold calculating part 71 for adjusting the 
average spectral dispersion of the background noise to obtain a 
spectral dispersion threshold. The energy and the 
spectral dispersion of each frame are then compared with the energy 
threshold and the spectral dispersion threshold to check whether the frame 
is a start point or an end point of a voice section. In this 
way, a voice start point with weak energy, such as a fricative sound, 
can be detected. 